ShRkC: Shard Rank Cutoff Prediction for Selective Search
Abstract
In search environments where large document collections are partitioned into smaller subsets (shards), processing the query against only the relevant shards improves search efficiency. The problem of ranking shards by their estimated relevance to the query has been studied extensively. However, the related and important task of deciding how many of the top-ranked shards should be searched for a query, so as to balance the competing objectives of effectiveness and efficiency, has received little attention. This task, shard rank cutoff estimation, is the focus of the presented work. The central premise of the proposed solution is that the number of top shards searched should depend on (1) the query, (2) the given ranking of shards, and (3) the type of search need being served (precision-oriented versus recall-oriented). An array of features that capture these three factors is defined, and a regression model is induced from these features to learn a query-specific shard rank cutoff estimator. An empirical evaluation using two large datasets demonstrates that the learned shard rank cutoff estimator provides substantial improvement in search efficiency as compared to strong baselines without degrading search effectiveness.
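The idea of a learned, query-specific cutoff can be illustrated with a minimal sketch. Everything in it is an assumption made for illustration: the abstract does not list the actual features or name the regression learner, so the three features below (query length, the top shard's share of the total score, and a recall-orientation flag), the least-squares fit with a tiny ridge term, and all function names are hypothetical stand-ins.

```python
def extract_features(query_terms, shard_scores, recall_oriented):
    """Build one feature vector from the three factors named in the abstract.

    The concrete features are illustrative assumptions, not the paper's set.
    shard_scores holds the scores of the ranked shards, highest first.
    """
    total = sum(shard_scores)
    # Share of the total score held by the top shard: a peaked ranking
    # suggests that only a few shards need to be searched.
    top_share = shard_scores[0] / total if total > 0 else 0.0
    return [
        1.0,                              # bias term
        float(len(query_terms)),          # query factor
        top_share,                        # shard-ranking factor
        1.0 if recall_oriented else 0.0,  # search-need factor
    ]


def fit_least_squares(X, y, ridge=1e-9):
    """Solve (X^T X + ridge * I) w = X^T y for the weight vector w.

    The small ridge term is an added assumption for numerical stability.
    """
    d = len(X[0])
    A = [[sum(row[i] * row[j] for row in X) + (ridge if i == j else 0.0)
          for j in range(d)] for i in range(d)]
    b = [sum(row[i] * target for row, target in zip(X, y)) for i in range(d)]
    # Gaussian elimination with partial pivoting.
    for col in range(d):
        pivot = max(range(col, d), key=lambda r: abs(A[r][col]))
        A[col], A[pivot] = A[pivot], A[col]
        b[col], b[pivot] = b[pivot], b[col]
        for r in range(col + 1, d):
            factor = A[r][col] / A[col][col]
            for c in range(col, d):
                A[r][c] -= factor * A[col][c]
            b[r] -= factor * b[col]
    # Back substitution.
    w = [0.0] * d
    for i in reversed(range(d)):
        tail = sum(A[i][j] * w[j] for j in range(i + 1, d))
        w[i] = (b[i] - tail) / A[i][i]
    return w


def predict_cutoff(weights, features, n_shards):
    """Round the regression output and clamp it to the range [1, n_shards]."""
    raw = sum(w * x for w, x in zip(weights, features))
    return max(1, min(n_shards, round(raw)))
```

In such a setup, the training targets would be the smallest cutoff at which searching the top shards matches a chosen effectiveness level for each training query; how the paper derives its targets is not stated in the abstract.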
Similar resources
Improving Shard Selection for Selective Search
The Selective Search approach processes large document collections efficiently by partitioning the collection into topically homogeneous groups (shards), and searching only a few shards that are estimated to contain relevant documents for the query. The ability to identify the relevant shards for the query directly impacts Selective Search performance. We thus investigate three new approaches ...
Efficient and Effective Large-scale Search
Search engine indexes for large document collections are often divided into shards that are distributed across multiple computers and searched in parallel to provide rapid interactive search. Typically, all index shards are searched for each query. For organizations with modest computing resources, the high query processing cost of this exhaustive search setup can be a deterrent to working with ...
Balancing Precision and Recall with Selective Search
This work revisits the age-old problem of balancing search precision and recall using the promising new approach of Selective Search, which partitions the document collection into topic-based shards and searches a select few shards for any query. In prior work, Selective Search has demonstrated strong search precision; however, this improvement has come at the cost of search recall. In this work,...
Does Selective Search Benefit from WAND Optimization?
Selective search is a distributed retrieval technique that reduces the computational cost of large-scale information retrieval. By partitioning the collection into topical shards, and using a resource selection algorithm to identify a subset of shards to search, selective search allows retrieval effectiveness to be maintained while evaluating fewer postings, often resulting in 90+% reductions i...
Comparison of MAP@500 Scores (cw09-a) for Rank-S and Taily Instances
Selective search is a modern distributed search architecture designed to reduce the computational cost of large-scale search. Selective search creates topical shards that are deliberately content-skewed, placing highly similar documents together in the same shard. At query time, rather than searching the entire corpus, a resource selection algorithm selects a subset of the topic shards likel...
Publication date: 2015